Distinguishing Fact from Fiction: Pattern Recognition in Texts Using Complex Networks

Authors

  • J. T. Stevanak
  • Lincoln D. Carr
Abstract

We establish concrete mathematical criteria to distinguish between different kinds of written storytelling, fictional and non-fictional. Specifically, we constructed a semantic network from both novels and news stories, with N independent words as vertices (nodes) and with edges (links) joining each vertex to the words occurring within m places of it; we call m the word distance. We then used measures from complex network theory to distinguish between news and fiction, studying the minimal text length needed as well as the optimal word distance m. The literature samples were most effectively represented by the power laws fitted to their degree distribution P(k) and clustering coefficient C(k); we also studied the mean geodesic distance and found that all our texts were small-world networks. We observed a natural break-point at k = √N where the power law in the degree distribution changed, leading to separate power-law fits for the bulk and the tail of P(k). Our linear discriminant analysis yielded an accuracy of 73.8 ± 5.15% for the correct classification of novels and 69.1 ± 1.22% for news stories. We found an optimal word distance of m = 4 and a minimum text length of 100 to 200 words N.
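The network construction described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the regular-expression tokenizer, the choice to link only distinct words, and the convention that a degree exactly equal to √N falls in the bulk are all assumptions made for illustration. It builds the word network for a word distance m, then computes vertex degrees, the local clustering coefficient, and the bulk/tail split at k = √N.

```python
from collections import defaultdict
import math
import re


def build_word_network(text, m=4):
    """Link each word to every distinct word within m positions of it.

    Tokenization (lowercase alphabetic runs) is an assumption of this sketch.
    """
    words = re.findall(r"[a-z']+", text.lower())
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        neighbors[word]  # register the vertex even if it gets no edges
        for other in words[i + 1 : i + 1 + m]:
            if other != word:  # no self-loops
                neighbors[word].add(other)
                neighbors[other].add(word)
    return dict(neighbors)


def degrees(neighbors):
    """Degree k of every vertex (number of distinct linked words)."""
    return {word: len(adj) for word, adj in neighbors.items()}


def local_clustering(neighbors, word):
    """Fraction of pairs of neighbors of `word` that are themselves linked."""
    adj = neighbors[word]
    k = len(adj)
    if k < 2:
        return 0.0
    # Each neighbor-neighbor edge is seen from both ends, hence the // 2.
    links = sum(1 for a in adj for b in neighbors[a] if b in adj) // 2
    return 2.0 * links / (k * (k - 1))


def split_at_breakpoint(neighbors):
    """Split vertices into bulk (k <= sqrt(N)) and tail (k > sqrt(N)).

    Placing k == sqrt(N) in the bulk is an assumption of this sketch.
    """
    threshold = math.sqrt(len(neighbors))
    deg = degrees(neighbors)
    bulk = sorted(w for w, k in deg.items() if k <= threshold)
    tail = sorted(w for w, k in deg.items() if k > threshold)
    return bulk, tail
```

For example, `build_word_network("the cat sat on the mat", m=1)` yields a network of N = 5 vertices in which "the" has degree 3 and is the only vertex above the √N threshold. Fitting power laws to P(k) and C(k) and running the linear discriminant analysis are left out of the sketch.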

Similar resources

Steel Consumption Forecasting Using Nonlinear Pattern Recognition Model Based on Self-Organizing Maps

Steel consumption is a critical factor affecting pricing decisions and a key element to achieve sustainable industrial development. Forecasting future trends of steel consumption based on analysis of nonlinear patterns using artificial intelligence (AI) techniques is the main purpose of this paper. Because there are several features affecting target variable which make the analysis of relations...

The System of Engagement in a Sample of Prose Fiction and the News

Emerging within Systemic Linguistics, Appraisal/Evaluation is a framework for analyzing the language of evaluation, providing techniques for the systematic analysis of evaluation and stance as they operate in whole texts and in groupings of texts. There are three systems in the Appraisal framework: Attitude, Engagement, and Graduation. This study sets out to analyze the use of the system of Eng...

Distinguishing between Positive and Negative Opinions with Complex Network Features

Topological and dynamic features of complex networks have proven to be suitable for capturing text characteristics in recent years, with various applications in natural language processing. In this article we show that texts with positive and negative opinions can be distinguished from each other when represented as complex networks. The distinction was possible by obtaining several metrics of ...

Local Derivative Pattern with Smart Thresholding: Local Composition Derivative Pattern for Palmprint Matching

Palmprint recognition is a new biometrics system based on physiological characteristics of the palmprint, which includes rich, stable, and unique features such as lines, points, and texture. Texture is one of the most important features extracted from low resolution images. In this paper, a new local descriptor, Local Composition Derivative Pattern (LCDP) is proposed to extract smartly stronger...

Pattern Recognition in Control Chart Using Neural Network based on a New Statistical Feature

Today for the expedition of the identification and timely correction of process deviations, it is necessary to use advanced techniques to minimize the costs of production of defective products. In this way control charts as one of the important tools for the statistical process control in combination with modern tools such as artificial neural networks have been used. The artificial neural netw...

Journal:
  • CoRR

Volume abs/1007.3254

Publication date: 2010